How to generate text: using different decoding methods for language generation with Transformers

Greedy Search

Greedy search selects the word with the highest probability as the next word: w_t = argmax_w P(w | w_{1:t-1}) at each timestep t

https://huggingface.co/blog/assets/02_how-to-generate/greedy_search.png

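At every step, pick the single next token with the highest probability.

Because decoding is deterministic, the output tends to fall into repeating the same text.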
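A minimal sketch of greedy search on a toy example. The table `TOY_MODEL` and its probabilities are hypothetical, and it conditions only on the last word, whereas a real language model conditions on the whole prefix w_{1:t-1}.

```python
# Hypothetical toy "language model": last word -> distribution over next words.
TOY_MODEL = {
    "The": {"nice": 0.5, "dog": 0.4, "car": 0.1},
    "nice": {"woman": 0.4, "house": 0.3, "guy": 0.3},
    "dog": {"has": 0.9, "runs": 0.05, "and": 0.05},
    "car": {"drives": 0.5, "is": 0.3, "turns": 0.2},
}


def greedy_search(start, steps):
    """Build a sequence by taking the argmax next word at every step."""
    sequence = [start]
    prob = 1.0
    for _ in range(steps):
        dist = TOY_MODEL.get(sequence[-1])
        if dist is None:  # no known continuation for this word
            break
        word = max(dist, key=dist.get)  # argmax over next-word probabilities
        prob *= dist[word]
        sequence.append(word)
    return sequence, prob


greedy_sequence, greedy_prob = greedy_search("The", steps=2)
print(greedy_sequence, greedy_prob)
```

In this toy table greedy search commits to "nice" at the first step and never reaches the higher-probability continuation that starts with "dog".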

Beam Search

https://huggingface.co/blog/assets/02_how-to-generate/beam_search.png

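Instead of looking only at the immediate next word, look further ahead and find a token sequence whose overall probability is high.

Beam size: the number of candidate sequences (beams) kept at each step.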
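A minimal sketch of beam search, assuming a hypothetical toy table `TOY_MODEL` that conditions only on the last word. The parameter name `num_beams` is chosen here for the beam size; scores are summed log-probabilities.

```python
import math

# Hypothetical toy "language model": last word -> distribution over next words.
TOY_MODEL = {
    "The": {"nice": 0.5, "dog": 0.4, "car": 0.1},
    "nice": {"woman": 0.4, "house": 0.3, "guy": 0.3},
    "dog": {"has": 0.9, "runs": 0.05, "and": 0.05},
    "car": {"drives": 0.5, "is": 0.3, "turns": 0.2},
}


def beam_search(start, steps, num_beams):
    """Keep the num_beams best partial sequences (by log-probability) at each step."""
    beams = [(0.0, [start])]  # each beam is (log_prob, sequence)
    for _ in range(steps):
        candidates = []
        for log_prob, seq in beams:
            dist = TOY_MODEL.get(seq[-1])
            if dist is None:  # no continuation: carry the beam over unchanged
                candidates.append((log_prob, seq))
                continue
            for word, p in dist.items():
                candidates.append((log_prob + math.log(p), seq + [word]))
        # Rank every expansion and keep only the best num_beams of them.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:num_beams]
    return beams


best_log_prob, best_sequence = beam_search("The", steps=2, num_beams=2)[0]
print(best_sequence, math.exp(best_log_prob))
```

With `num_beams=1` this reduces to greedy search. With two beams the toy example keeps "dog" alive after the first step and ends up with a more probable sequence than the greedy one.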

Sampling

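The next token is drawn at random from the model's probability distribution (unlike the two methods above, this is not deterministic).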
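A minimal sketch of sampling the next token, assuming a hypothetical next-word distribution `dist`. The helper name `sample_next` is made up for illustration, and the seed is fixed only so the run is reproducible.

```python
import random


def sample_next(dist, rng):
    """Draw one token at random, weighted by its probability."""
    words = list(dist)
    weights = [dist[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed, for reproducibility only
dist = {"nice": 0.5, "dog": 0.4, "car": 0.1}  # hypothetical distribution
samples = [sample_next(dist, rng) for _ in range(1000)]
counts = {w: samples.count(w) for w in dist}
print(counts)
```

Low-probability tokens such as "car" still get picked now and then, which is what makes the output vary from run to run.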

Top-p (nucleus) sampling

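Sample only from the smallest set of highest-probability tokens whose cumulative probability reaches p (the probability mass is then renormalized over that set).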
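A minimal sketch of the top-p filtering step, assuming a hypothetical distribution `dist`. The helper name `top_p_filter` is made up for illustration; sampling would then be done from the returned distribution.

```python
def top_p_filter(dist, p):
    """Keep the smallest set of most probable tokens whose cumulative
    probability reaches p, then renormalize over that set."""
    ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:  # enough probability mass collected
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


dist = {"nice": 0.5, "dog": 0.4, "car": 0.1}  # hypothetical distribution
nucleus = top_p_filter(dist, p=0.8)
print(nucleus)
```

The number of tokens kept depends on the shape of the distribution: a peaked distribution keeps few tokens, a flat one keeps many.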

Top-K Sampling

Sample only from the k tokens with the highest probability.
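A minimal sketch of the top-k filtering step, assuming a hypothetical distribution `dist`. The helper name `top_k_filter` is made up for illustration.

```python
def top_k_filter(dist, k):
    """Keep the k most probable tokens and renormalize over them."""
    ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(prob for _, prob in ranked)
    return {token: prob / total for token, prob in ranked}


dist = {"nice": 0.5, "dog": 0.4, "car": 0.1}  # hypothetical distribution
top2 = top_k_filter(dist, k=2)
print(top2)
```

Unlike top-p, the number of kept tokens is fixed at k regardless of how peaked or flat the distribution is.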

Temperature

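A positive real number T; the logits are divided by T before the softmax.

A large T pushes the distribution toward uniform; a small T sharpens it toward the most probable token.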
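A minimal sketch of a softmax with temperature, assuming hypothetical logits. The function name `softmax_with_temperature` is made up for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Divide the logits by the temperature, then apply a softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical logits
sharp = softmax_with_temperature(logits, 0.5)   # T < 1: more peaked
base = softmax_with_temperature(logits, 1.0)    # T = 1: plain softmax
flat = softmax_with_temperature(logits, 100.0)  # large T: close to uniform
print(sharp, base, flat)
```

As T approaches 0 the distribution concentrates on the argmax token, so sampling behaves more and more like greedy search.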